ECUE: A Spam Filter that Uses Machine Learning to Track Concept Drift

Authors

  • Sarah Jane Delany
  • Padraig Cunningham
  • Barry Smyth
Abstract

While text classification has been identified for some time as a promising application area for Artificial Intelligence, so far few deployed applications have been described. In this paper we present a spam filtering system that uses example-based machine learning techniques to train a classifier from examples of spam and legitimate email. This approach has the advantage that it can personalise to the specifics of the user’s filtering preferences. This classifier can also automatically adjust over time to account for the changing nature of spam (and indeed changes in the profile of legitimate email). A significant software engineering challenge in developing this system was to ensure that it could interoperate with existing email systems to allow easy management of the training data over time. This system has been deployed and evaluated over an extended period and the results of this evaluation are presented here.
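The general idea of an example-based classifier that adjusts as mail changes can be illustrated with a minimal sketch. This is not the ECUE implementation: the whitespace tokenizer, the Jaccard similarity, the nearest-neighbour vote, and the names `CaseBase`, `classify`, and `feedback` are all assumptions chosen for brevity. The sketch only shows how stored examples can drive classification and how a user correction can extend the stored examples.

```python
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately crude, assumed feature extractor)."""
    return set(text.lower().split())


def jaccard(a, b):
    """Similarity between two token sets: 1.0 for identical sets, 0.0 for disjoint ones."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


class CaseBase:
    """Hypothetical store of labelled example emails, classified by nearest neighbours."""

    def __init__(self, k=3):
        self.k = k
        self.cases = []  # list of (token_set, label) pairs

    def add(self, text, label):
        self.cases.append((tokenize(text), label))

    def classify(self, text):
        if not self.cases:
            raise ValueError("case base is empty")
        tokens = tokenize(text)
        # Rank stored examples by similarity to the new email, most similar first.
        ranked = sorted(self.cases, key=lambda c: jaccard(tokens, c[0]), reverse=True)
        # Majority vote among the k most similar examples.
        votes = Counter(label for _, label in ranked[: self.k])
        return votes.most_common(1)[0][0]

    def feedback(self, text, true_label):
        """If the filter got this email wrong, store it as a new example."""
        if self.classify(text) != true_label:
            self.add(text, true_label)


# Usage: a new style of spam is misclassified until the user corrects it.
cb = CaseBase(k=1)
cb.add("cheap pills buy now", "spam")
cb.add("meeting agenda for monday", "ham")
before = cb.classify("agenda for your free prize")
cb.feedback("agenda for your free prize", "spam")
after = cb.classify("your free prize agenda")
```

Adding only the misclassified examples is one simple update policy among several; a real system would also need some way to retire stale examples, which this sketch omits.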

Similar resources

Using Case-Based Reasoning for Spam Filtering

Spam is a universal problem with which everyone is familiar. Figures published in 2005 state that about 75% of all email sent today is spam. In spite of significant new legal and technical approaches to combat it, spam remains a big problem that is costing companies meaningful amounts of money in lost productivity, clogged email systems, bandwidth and technical support. A number of approaches a...

Catching the Drift: Using Feature-Free Case-Based Reasoning for Spam Filtering

In this paper, we compare case-based spam filters, focusing on their resilience to concept drift. In particular, we evaluate how to track concept drift using a case-based spam filter that uses a feature-free distance measure based on text compression. In our experiments, we compare two ways to normalise such a distance measure, finding that the one proposed in [1] performs better. We show that a...

Applying lazy learning algorithms to tackle concept drift in spam filtering

A large number of machine learning techniques have been applied to problems where data is collected over an extended period of time. However, the disadvantage with many real-world applications is that the distribution underlying the data is likely to change over time. In these situations, a problem that many global eager learners face is their inability to adapt to local concept drift. Concept ...

Personalised, Collaborative Spam Filtering

The state of the art sees content-based filters tending towards collaborative filters, whereby email is filtered at the MTA with users feeding information back about false positives and negatives. While this improves the ability of the filter to track concept drift in spam over time, such approaches make assumptions implicit in centralised spam filtering, such as that all users consider the sam...

Feature-Based and Feature-Free Textual CBR: A Comparison in Spam Filtering

Spam filtering is a text classification task to which Case-Based Reasoning (CBR) has been successfully applied. We describe the ECUE system, which classifies emails using a feature-based form of textual CBR. Then, we describe an alternative way to compute the distances between cases in a feature-free fashion, using a distance measure based on text compression. This distance measure has the advant...

Publication date: 2006